A note on scrambled Halton sequences
Abstract
Halton’s low discrepancy sequence is still very popular in spite of its shortcomings with respect to the correlation between points of two-dimensional projections for large dimensions. As a remedy, several types of scrambling and/or randomization for this sequence have been proposed. We examine some of these empirically by calculating their L∞- and L2-discrepancies (D* resp. T*), and by performing integration tests. Most investigated sequence types give practically equivalent results for D*, T*, and the integration error, with two exceptions: random shift sequences are in some cases less efficient, and the shuffled Halton sequence is no more efficient than a pseudo-random one. However, the correlation mentioned above can only be broken with digit scrambling methods, even though the average correlation of many randomized sequences tends to zero.

Keywords: Quasi-Monte Carlo methods; Low discrepancy sequences; Numerical integration
MSC classification codes: 65C05, 65D30, 11K36
Author’s address: Physikalisches Institut, Hermann-Herderstr. 3, D-79104 Germany
Author’s e-mail: [email protected]

I. Introduction

The Halton sequence [6] is still one of the most popular low discrepancy sequences. It can be produced easily and fast with simple algorithms, of which several have been published (e.g. [12], [1]; the original [7] is obsolete). Halton sequences are of course not unique but depend on the set of prime numbers taken as bases to construct their vector components. Typically, and most efficiently, the lowest possible primes are used.

Criticism against the use of the Halton sequence instead of the more recent ones of Sobol [25], Niederreiter [15], or Niederreiter-Xing [17] comes from two sides: First, the asymptotic behavior of its star discrepancy,

D* ≈ C(s) (log N)^s / N,   (1)

has a coefficient C(s) which increases exponentially with dimension s [e.g. 13], and, second, two-dimensional orthogonal projections of vector components with large primes as bases (which are needed if the dimension is large) show a high correlation between coordinate pairs for sequences whose length N is not large compared to the product of the two primes. This behavior, i.e. that the projections onto dimensions j and k concentrate along the diagonal x_j = x_k and parallels to it, has also been dubbed “bad coverage” (i.e. of the projected subspace) [8].

The question of which low discrepancy sequence is the “best” for quasi-Monte Carlo integration has, unfortunately, no unambiguous answer. Theoretically, one can define parameters of uniformity [see e.g. 16], of which the most prominent are the L∞ or star discrepancy D*, and the L2 or root-mean-square discrepancy T*. D* appears in an often cited rigorous error limit, the Koksma-Hlawka inequality, but its calculation is infeasible even for moderate values of s and N, while for T*, which can be calculated in a wider parameter range, no similar bound exists. Experimentally, a lot of integration tests (and tests in more complex applications) have been performed (e.g. [14], [26], [29], [22]); however, most of them are not very conclusive because the authors do not show the heavily fluctuating (!) behavior of the integration error g(N) as a function of the number of trials. Only the smoothed behavior, i.e. the trend of the errors as a function of N, can be taken as meaningful.

We come back to the arguments against the use of Halton sequences. The first one is void, because it is meanwhile clear [21] that now and in the foreseeable future we are far from the asymptotic region of eq. (1) in practical quasi-Monte Carlo problems.
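As a rough numerical illustration of this point (our own sketch, not part of the paper’s computations; the coefficient C(s) is simply set to 1 and natural logarithms are assumed), one can compare the right-hand side of eq. (1) with the N^(-1/2) scaling of a plain Monte Carlo error:

```python
import math

def halton_bound(N: int, s: int, C: float = 1.0) -> float:
    """Right-hand side of eq. (1): C(s) * (log N)^s / N, with C set to 1 here."""
    return C * math.log(N) ** s / N

if __name__ == "__main__":
    s = 8
    for N in (10**3, 10**4, 10**6, 10**8):
        print(f"N = {N:>9d}:  (log N)^s/N = {halton_bound(N, s):10.3e}"
              f"   vs.  N^(-1/2) = {N ** -0.5:.1e}")
```

Even with C(s) = 1 the bound stays above unity up to N = 10^8 for s = 8, i.e. orders of magnitude above the errors actually observed, which is the sense in which practical computations are far from the asymptotic region.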
On the contrary, computations of D* [21], T*, and errors in integration tests [22] show that Halton sequences (not taking unnecessarily large primes as bases), Sobol, and Niederreiter sequences show very similar behavior of D*(N), T*(N) and g(N), and there are many examples in which Halton sequences lead to smaller errors than, e.g., Niederreiter ones. The correlation problem is a more serious one under two conditions: (a) the number N of trial points is not large compared with the product of the two largest prime bases of the Halton sequence (generally p(s) and p(s-1), where p(k) is the kth prime), and (b) the integrand itself is highly correlated in the selected dimensions, which is often the case in simulation problems.

This is now the point where “scrambling” enters the stage. The idea was first proposed by Braaten and Weller [3]. The kth component of the Halton sequence is defined as the van der Corput sequence to prime b = p(k), i.e.

N_k(n) = d_1/b + d_2/b^2 + d_3/b^3 + ...,   (2)

where d_1, d_2, d_3, ... are the digits (least significant first) of the base-b representation of n (they depend, of course, on both n and k). For finite n, the set of digits is also finite, and it can be shown that it is allowed to permute this set (leaving only the digit 0 fixed) by a permutation B_k which is fixed for the whole sequence n = 1...N but different for different dimensions k = 1...s. I.e. we have now

R_k(n) = B_k(d_1)/b + B_k(d_2)/b^2 + B_k(d_3)/b^3 + ...   (3)

The scrambled or generalized Halton sequence is then composed from these components R_k(n) instead of the original N_k(n). There remains the question of which permutations to take. In Ref. [3], in lack of a better method, these were obtained by minimizing T* for an increasing number of dimensions, leaving the earlier permutations fixed. This leads to a table of permutations, given there up to s = 16, and extendable with moderate effort [28]. Similar tables have been used in Ref. [27]. A very special permutation, which simply reverses the order of the nonzero digit values, was recently proposed [28]. Other permutations are discussed below.
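To make eqs. (2) and (3) concrete, here is a minimal Python sketch of the (scrambled) radical inverse (our own illustration; the computations of this paper were done in Mathematica and Fortran). The per-dimension permutations perms[k], with the digit 0 kept fixed, are assumed to be supplied, e.g. from the tables of Ref. [3]:

```python
from typing import Optional, Sequence

def radical_inverse(n: int, b: int, perm: Optional[Sequence[int]] = None) -> float:
    """Eq. (2) for perm=None; eq. (3) if perm is a digit permutation with perm[0] == 0."""
    x, f = 0.0, 1.0 / b
    while n > 0:
        n, d = divmod(n, b)              # digits of n, least significant first
        if perm is not None:
            d = perm[d]                  # apply the permutation B_k to each digit
        x += d * f
        f /= b
    return x

def halton_point(n: int, bases: Sequence[int],
                 perms: Optional[Sequence[Sequence[int]]] = None) -> list:
    """One s-dimensional (scrambled) Halton point; bases are e.g. the first s primes."""
    return [radical_inverse(n, b, None if perms is None else perms[k])
            for k, b in enumerate(bases)]
```

With perms = None this reproduces the standard component N_k(n) of eq. (2); supplying, e.g., the permutation (0, b-1, b-2, ..., 1) for each base gives the reverse sequence of [28].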
Recently, under the label of “shuffled” Halton sequences, another way of breaking the above mentioned correlations was promoted [8], which had already been proposed in [13]. Here, the whole Halton sequence is first produced for a given length N, and then all of its vector components N_k(n), n = 1...N, are randomly permuted by different permutations. As we will discuss below in Section VI, we doubt that this is still a low discrepancy sequence for s > 1.

As a remedy for the non-existence of practical error estimates, randomized low discrepancy sequences were introduced [19]. Applied to the Halton sequence this means that several random scramblings are applied to the sequence and the average outcome of using them is taken as the result. This allows error estimates which, instead of strict but useless bounds, are now useful though statistical error estimates similar to those available for pseudo-Monte Carlo integration. There are many different ways of scrambling the Halton sequence, some of which have been discussed in Ref. [29].

As a contribution to this discussion, we present here computations of D* and T* and of average and rms errors g(N) and σ(N) from integration tests for several scrambled and randomized Halton sequences. As a compromise with regard to our computer resources, D* was computed only in 4 dimensions, which then allows one to reach N of about 2000. T* was limited here to small and moderate N, since the correlation problem discussed above is more interesting for small than for large N. Some integral tests with randomized Halton sequences have also been published in Ref. [29]. As a general result we find that the dependence of D*, T*, and integration errors on the method of scrambling and/or randomizing the Halton sequence is very small, so that other considerations, e.g. the ease of implementation, could play a larger role in its choice.

II. The investigated sequences

We list here the sequences investigated in this work and the code letters used for them:

(H) the original Halton sequence started from N = 1 (but see the remark below). In addition to this single sequence, we also used consecutive pieces of length N from a single Halton sequence as replicates from which standard deviations could be estimated. In Ref. [18] this has been called “internal replication”. We have used this procedure since Ref. [1] for all integrations. When we compute discrepancies we code it as H5. Furthermore,
(B) is the scrambled Halton sequence of Ref. [3] (this is a special case of digit scrambling),
(R) the reverse Halton sequence of [28],
(F) the randomly shifted Halton sequence [26, 29],
(V, Y and Z) are different implementations of the random-start Halton sequence [29],
(D) the digit-scrambled Halton sequence mentioned in Ref. [29], and
(M) the shuffled Halton sequence of [13], heavily promoted in [8], which we treat in Section VI.

In cases F to M above one obtains random instances (“replicates” or “repetitions”) of the scrambled sequence, which allow the computation of statistical parameters. For the discrepancies we generally used five such repetitions, for the integration tests 10 or 1000. As usual, we take the smallest available primes as bases. The implementation of most sequences is trivial, and the reader can consult the references. For digit scrambling we mention that we kept the digit 0 unscrambled (as in [3], not mentioned in [29]). For the computation of D* all sequences were computed in Mathematica and sent to a Fortran routine. An implementation note for the random start sequence is given in the Appendix.
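Two of the randomizations above are simple enough to state in a few lines. The following sketch (ours; it assumes that the points of one length-N piece are already available as an (N, s) NumPy array) shows the random shift (F), which is usually implemented as a Cranley-Patterson rotation, and the component shuffle (M):

```python
import numpy as np

def random_shift(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sequence F: add one uniform random vector to every point, modulo 1;
    one shift vector per replicate."""
    shift = rng.random(points.shape[1])
    return (points + shift) % 1.0

def shuffle_components(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sequence M: permute each coordinate column of the length-N piece
    independently; the whole piece must be held in memory and is not extensible."""
    n, s = points.shape
    out = np.empty_like(points)
    for k in range(s):
        out[:, k] = points[rng.permutation(n), k]
    return out
```

Each call with a fresh random state produces one replicate; repeating this gives the 5, 10 or 1000 repetitions quoted above.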
III. L∞-discrepancies D* of some Halton sequences

In this section we show results on the L∞-discrepancy for various modified Halton sequences in s = 4 dimensions. This choice is a compromise between the wish for high dimension and the limits on N set by the available computer resources. As for other sequences [21], after a short initial region (N ≲ 10), which is necessary to keep D* < 1.0, D* decreases with a power law whose exponent is approximately -0.7 here, its magnitude increasing very slowly with N. Fig. 1 shows the three non-randomized sequences (H, B, and R). Their difference is visible, though not important, especially since a practitioner should anyhow use several different sequences for error control. Fig. 2 shows an example of five instances of the digit-scrambled sequence plus a shifted plot of the internally replicated sequence. One observes a striking similarity. Note that the true fluctuations are larger than seen in the figures, since for N > 200 we computed D*(N) only for selected values of N. Fig. 3 shows how near the D*(N) of the randomized sequences (F, V, and D) are to each other. The rms ratio of any pair of data sets differs from 1.0 by less than 10%, whereas the maximum observed absolute deviation from 1 of the ratio of any pair due to fluctuations is < 30%. It also shows that the shuffled sequence (M) has a larger D* than the others and decays much more slowly (cf. Section VI). Fig. 4 shows that the three different implementations of the random start Halton sequence are practically identical. Only their fluctuations occur at different N.

IV. L2-discrepancies T* of some Halton sequences

In contrast to the L∞-discrepancy D*, the L2-discrepancies T (for arbitrary axis-parallel hyper-rectangles) and T* (for axis-parallel hyper-rectangles containing the origin as one corner) can be calculated to much higher N. We did that for s = 4 and 12 over a range of N reaching well beyond what is feasible for D* (cf. Figs. 5 to 7). Here we concentrate on T*, for which the inequality T* ≤ D* holds, and which has often been used to assess the quality of different sequences (e.g. [3], [26], [11], [10]).

The typical situation for small dimension (here s = 4) is shown in Fig. 5: With the exception of the first few points of some sequences (including standard H, which is not shown) and of the shuffled sequence (M), the values of T* stay below the expectation for a random sequence (straight line). But for N ≲ 30 they are not too far from random behavior, and only then turn into a steeper decay. For N > 100, T* of sequences B (not shown), D, H5 and V is practically the same, while that of F is somewhat higher, and the shuffled sequence M stays always near the random line. Except for M, the log-log slope steepens slowly, for H5, B, D, and V from about -0.75 to about -0.85. Such an increase is expected since T* ≤ D*, and D* is asymptotically O((log N)^s/N) with a slope approaching -1. But note that the crossover of the asymptotic D* with the random expectation of T* lies very far out in N, and for s = 12 even much further than for s = 4.

Fig. 6 shows T* of five instances of the digit-scrambled sequence (D, shifted upwards) and of the internally replicated sequence H5. The differences between the replicates, about 30% of the value, are generally not larger than the fluctuations. Fig. 7 shows for dimension 12 that sequence M, and even more so H, begins with a T* much larger than the random expectation. But from N ≈ 1000 onwards the curves for H, F, and V drop fast below the random expectation, while that for M stays approximately on it, supporting our view that shuffling destroys the low discrepancy of the sequence. Again, H5 is very similar to D; however, the fluctuations are much bigger, which one may expect from the “holes” in the replicates produced by starting them with N > 1.
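For readers who want to reproduce such T* values: the L2 star discrepancy has a closed form (Warnock’s formula), which is what makes it computable up to much larger N than D*. A minimal sketch (ours; the T* values of this paper were computed with Fortran routines) is:

```python
import numpy as np

def t_star(points: np.ndarray) -> float:
    """L2 star discrepancy T* of an (N, s) array of points in [0,1)^s,
    via Warnock's closed formula (O(N^2 s) operations)."""
    n, s = points.shape
    # (1/N^2) * sum_{i,j} prod_k (1 - max(x_ik, x_jk))
    prod_max = np.prod(1.0 - np.maximum(points[:, None, :], points[None, :, :]), axis=2)
    term1 = prod_max.sum() / n**2
    # (2^(1-s)/N) * sum_i prod_k (1 - x_ik^2)
    term2 = 2.0 ** (1 - s) / n * np.prod(1.0 - points**2, axis=1).sum()
    return float(np.sqrt(term1 - term2 + 3.0 ** (-s)))
```

The O(N^2) double sum is the practical limit on N (the (N, N, s) intermediate array can be avoided by looping over points). For a truly random point set the expectation of T*^2 is (2^(-s) - 3^(-s))/N, presumably the “random expectation” line of Figs. 5 to 7.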
V. Integration tests with randomized Halton sequences

These were done similarly to those of our former paper [22], where the test functions and the details of the integration procedure have been described. However, we have used narrower N-steps here. The simple test functions used have the advantage that their variance, variation, and effective dimension could be computed. For s = 12 the latter is approximately 11 for TF2 and TF3, 3 for TF1, 1.5 for TF4 and TF12, and near 1 for TF13 and TF14. Test functions TF1 and TF12-TF14 are equivalent to functions used in Ref. [29], where, however, the integrations were done only for a few values of N.

In this section we concentrate on methods H (replicated!), F, V and M. The number of replications was 10 or 1000. The use of internal replication for H, which appears to us quite natural, could also be thought of as a kind of “poor man’s random start scrambling”. Since we must show the integration errors as log-log plots, we use the mean absolute error g(N) besides the rms error σ(N) in our figures. The fluctuations of g(N) are very large (and would appear still larger had we computed g for every N), therefore we present only one example in Fig. 8, from which one can also see that the rms error of a single replicate (i.e. not divided by √(R−1), R being the number of replications) is a good probabilistic measure for the mean absolute error, even in case H, where that has not been proven. Moreover, the fluctuations of the absolute error of ten different integrations with methods H, V (and also F, not shown) are comparable to or even larger than the differences between sequences. This situation is very much the same for s = 4 and s = 12.

As Figs. 9 to 11 show, the rms error of a single replicate is practically the same if computed from 10 or 1000 replicates (as it should be), and practically independent of the scrambling method. This statement has, however, two exceptions: (a) with some test functions (e.g. TF4 for s = 12) the error from using the random shift sequence (F) drops more slowly than for H and V, and (b) that of the shuffled sequence (M) is always parallel and near to the pseudo-random expectation. This different behavior of F and M has also been found for D* (cf. Fig. 3) and T* (cf. Fig. 5). Our main conclusion is that for not too small N the quality of integration with (replicated pieces of) the unmodified Halton sequence or the different scrambled versions of it (perhaps with the exception of random shift scrambling) is essentially the same in view of the fluctuations of even an averaged integration error. Since the fluctuation maxima and minima for different replicates occur at different values of N, different scramblings should always be used as a consistency check, especially if one intends to do the integration with a fixed number N.
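The error measures used in these figures can be stated compactly. Given R replicate estimates of the same integral with known exact value, a sketch of the three quantities discussed above (mean absolute error g(N), rms error of a single replicate, and the error estimate of the average) might look as follows; the function and variable names are ours:

```python
import math
from statistics import mean

def replicate_errors(estimates: list, exact: float) -> dict:
    """Error statistics over R >= 2 replicate estimates of the same integral."""
    errors = [q - exact for q in estimates]
    r = len(errors)
    g = mean(abs(e) for e in errors)                      # mean absolute error g(N)
    rms_single = math.sqrt(mean(e * e for e in errors))   # rms error of one replicate
    rms_of_mean = rms_single / math.sqrt(r - 1)           # error estimate of the average
    return {"g": g, "rms_single": rms_single, "rms_of_mean": rms_of_mean}
```

The division by √(R−1) corresponds to the 0.5 resp. 1.5 log-units by which, according to the caption of Fig. 8, the rms errors of the averages over 10 resp. 1000 replications would lie lower.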
VI. Remarks on the shuffled Halton sequence

In 1994 Morokoff and Caflisch [13] proposed to scramble the Halton sequence by using separate random permutations of the vector components for each given N. They expressed their hope that the discrepancy of such a sequence might be smaller than that of a pseudo-random one. (But note that it is no longer an extensible sequence, to which one can add additional points without re-computing the old ones. It also needs a large memory for large N.) Recently, this kind of scrambling has been strongly promoted under the name “shuffled Halton sequence” by Hess and Polak [8], and then also used by others in applications (e.g. [2], [5], [30]). But, while it is true that this shuffling solves the problem of “coverage” and of correlation between pairs of dimensions, the superiority of this sequence over a purely pseudo-random one has not been convincingly shown. We remark again that, in view of the fluctuations of D*(N), T*(N), and g(N), only (plots of) results for many different values of N show what really happens.

In contrast to Ref. [8], we conjecture that the shuffled Halton sequence is no longer a low-discrepancy one, and will not generally perform better than a purely random sequence. In the N-range computed by us both discrepancies D* (Fig. 3) and T* (Figs. 5 and 7) are not essentially different from those of a pseudo-random sequence, and in no way comparable to those of properly scrambled sequences. Similarly, in all integration tests (e.g. Figs. 9 to 11) the shuffled sequence performed worse than regularly scrambled ones, and generally only marginally better than pseudo-Monte Carlo. In particular, the slope of the error plots is never far from the pseudo-random value of -1/2.

It is well known that one big difference between pseudo-random and quasi-random sequences is that in the latter there is correlation, i.e. each new point “knows” where the older ones are, while in the completely uncorrelated pseudo-random case this is not so. To illustrate this, we computed for sequences of given length N the distribution of the distance from each point x_n to the nearest neighboring point x_j, j = 1...N, j ≠ n. Fig. 12 shows that standard and Braaten-Weller scrambled Halton sequences have distributions which are very distinct from that of shuffled sequences, which practically coincide here with pseudo-random ones. This is another indication that the shuffled sequence is more or less a pseudo-random one.
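The nearest-neighbour statistic behind Fig. 12 is easy to reproduce; a hedged sketch (Euclidean distances and a simple O(N^2) evaluation are our assumptions) is:

```python
import numpy as np

def nearest_neighbour_distances(points: np.ndarray) -> np.ndarray:
    """For each of the N points, the Euclidean distance to its nearest
    neighbour among the other N-1 points (pairwise O(N^2) computation)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)          # exclude the point itself
    return np.sqrt(d2.min(axis=1))
```

Histogramming these N values for, say, 1000 Halton, scrambled Halton, shuffled, and pseudo-random points is all that is needed to redo the comparison of Fig. 12.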
We conclude that, if it is important to avoid any correlation between different dimensions of the problem, and if one prefers quasi-Monte Carlo over pseudo-Monte Carlo methods, one should either use digit-scrambled Halton sequences (random shift and random start do not help, though the average correlation of a great number of randomized replicates is also small), or resort to some other type of low-discrepancy sequence, e.g. Sobol’s or Niederreiter’s ([25], [15]; for implementations see [4], [9]).

VII. Conclusions

We saw that the differences of D*(N) and T*(N) between differently scrambled Halton sequences are small, and those of the integration errors could be called negligible. This is in concord with the results of our former papers [21, 22], where we compared sequences of the Halton, Niederreiter and Niederreiter-Xing type, and also did not find the big differences which one could anticipate from the literature. The same is true for Sobol sequences with different direction numbers [23]. What is wrong? Let us examine on what these expectations are based.

1) Theoretical formulas about worst case errors, as discussed in [22]. We showed there, I hope convincingly, that practical computations are neither in the asymptotic regime nor do they, in practice, have errors proportional to worst case errors.
2) Computations of the L2-discrepancy T*, the only one which can feasibly be computed for problems with realistic N and dimensions > 3. However, it turns out that T* is of doubtful value for the assessment of integration errors.
3) Computational tests of more or less practical problems, which show many conflicting results (as e.g. discussed for problems in financial mathematics in Ref. [24]).

In view of the large fluctuations of quasi-Monte Carlo integration as a function of the number of trials N (see Figs. 8 to 11 of this note, and Figs. 2, 4, and 6 of Ref. [22], where the plotted fluctuations are those of the average of 10 integrations) we suspect that many tests in the literature show just the result of a positive or negative fluctuation. Unfortunately, in other papers often only fits but not the fluctuating points are shown, and often the full set of parameters (number of replicates, range of the fits) is not given, which makes the assessment of such papers difficult.

On the other hand, we will not conceal here the limitations of our own work: (a) We used a limited number of test functions, and only rather simple ones. (Note, however, that TF4 is not a simple product function, and that we know the effective dimension of each function used.) (b) Our largest dimension was 24 resp. 12. In financial mathematics 240 might be a more typical number. However, it has also been shown at least in one case ([20], Keister’s integral) that the effective dimension of such problems can also be very small. (c) We concentrated on large N, which for not too exotic test functions gives integration errors much smaller than 1%. If one is happy with errors of several percent, a further examination of the range N < 1000 should be done. In that case it is essential to investigate also the fluctuations more thoroughly by looking at an increased number (thousands to millions) of replicates.

Appendix: Implementation of random start scrambling

A little more detail about the implementation of the “random start Halton sequence” of Wang and Hickernell [29] may be useful. It starts with a random point {u_1, u_2, ..., u_s} in [0,1)^s and obtains the sequence by repeatedly applying, component-wise, a von Neumann-Kakutani transform T_b(x), where b = b(j), j = 1...s, is the Halton base for component j. It is shown in the reference that the transformed sequence is identical to a standard Halton sequence whose components N_k(n) are started not with n_k = 1 but with a set of m_k > 0, k = 1...s, which can be computed from the u_k. It is further shown that this transform preserves the low-discrepancy property of the sequence.

A closer view shows that Halton numbers are rational numbers, and that the von Neumann-Kakutani transform as described in the reference should start with a random rational in [0,1) and be implemented in rational arithmetic. In this case the set of start numbers m_k could be exactly recovered, and a standard Halton implementation used to derive the rest of the sequence. Of course, in production runs this is impractical. So the question is where to switch from rational to real arithmetic. We have implemented three alternatives: In the first one (code Y) we start with integer random numbers in the arbitrarily chosen range 0 ≤ R_i ≤ 999999, divide by 10^6, compute the exact rational Halton sequence, and transform this to real arithmetic for the calculation of D*. In the second one (code Z) we start with real random numbers and compute the sequence in real arithmetic. In this case an arbitrary fixed number of digits for the implementation must be chosen, and the equation k = Floor(-ln(1-x)/ln(b)) of the reference must not be used, since it can be false in real arithmetic. Our third implementation (code V) starts the Halton sequence directly with a set of random integers m_k, k = 1...s, between 1 and a value b_k^v - 1, where v is taken such that b_k^v just exceeds some fixed m_max, which we took to be 1 000 000. (We tried also 1000 without much difference.) These starts are uniformly distributed like the von Neumann-Kakutani transformed ones. All versions were implemented in Mathematica, and the computed sequences were transferred to our Fortran programs for D* and T*, except for V, which is easily implemented directly in Fortran. Fig. 4 shows, as an example, that the differences in D* between the three versions are, indeed, smaller than the fluctuations of D*(N) and the variance of the repetitions.
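A minimal sketch of variant V as read from the description above (our own code; only the choice m_max = 1 000 000 and the range of the random starts follow the text, everything else, including the names, is an illustrative assumption):

```python
import random
from typing import List, Optional

def random_starts(bases: List[int], m_max: int = 1_000_000,
                  rng: Optional[random.Random] = None) -> List[int]:
    """One random start index m_k per base b_k: uniform in 1 .. b_k**v - 1,
    where v is the smallest power for which b_k**v exceeds m_max."""
    rng = rng or random.Random()
    starts = []
    for b in bases:
        limit = b
        while limit <= m_max:            # find b**v just exceeding m_max
            limit *= b
        starts.append(rng.randrange(1, limit))
    return starts

def radical_inverse(n: int, b: int) -> float:
    """Standard van der Corput radical inverse of eq. (2)."""
    x, f = 0.0, 1.0 / b
    while n > 0:
        n, d = divmod(n, b)
        x += d * f
        f /= b
    return x

def random_start_halton_point(n: int, bases: List[int], starts: List[int]) -> List[float]:
    """nth point of one replicate of sequence V: component k is the ordinary
    Halton component, started at index m_k instead of 1."""
    return [radical_inverse(m + n - 1, b) for b, m in zip(bases, starts)]
```

Each fresh call to random_starts defines one replicate of sequence V; averaging D*, T*, or integration results over such replicates gives the repetitions used in the figures above.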
References

[1] M. Berblinger and Ch. Schlier, Monte Carlo integration with quasi-random numbers: some experience, Comput. Phys. Comm. 66 (1991) 157-166.
[2] J. L. Bowman, Logit kernel (or mixed logit) models for large multidimensional choice problems: identification and estimation, http://jbowman.net/papers/B03.pdf
[3] E. Braaten, G. Weller, An improved low-discrepancy sequence for multidimensional quasi Monte Carlo integration, J. Comput. Phys. 33 (1979) 249-258.
[4] P. Bratley, B. L. Fox and H. Niederreiter, Implementation and tests of low-discrepancy sequences, ACM Trans. Math. Software 2 (1992) 195-213.
[5] L. Chiou, J. L. Walker, Identification and estimation of mixed logit models under simulation methods, http://people.bu.edu/joanw/JW_SimID.pdf
[6] J. H. Halton, On the efficiency of certain quasi-random sequences of points in evaluating multidimensional integrals, Numer. Math. 2 (1960) 84-90 and 196.
[7] J. H. Halton, G. B. Smith, Radical inverse quasi-random point sequence, Algorithm 247, Comm. ACM 7 (1964) 701.
[8] S. Hess and J. W. Polak, The shuffled Halton sequence, CTS Working Paper (2003), see the homepage http://www.cts.cv.imperial.ac.uk/StaffPages/StephaneHess/
[9] H. S. Hong and F. J. Hickernell, Implementing scrambled digital sequences, ACM Trans. Math. Software 29 (2003) 95-109.
[10] P. Jaeckel, Monte Carlo Methods in Finance, John Wiley and Sons, Chichester, 2002.
[11] F. James, J. Hoogland and R. Kleiss, Multidimensional sampling for simulation and integration: measures, discrepancies, and quasi-random numbers, Comput. Phys. Comm. 99 (1997) 180-220.
[12] M. Kolar, S. F. O'Shea, Comput. Math. Appl. 25 (1993) 3-13.
[13] W. J. Morokoff and R. E. Caflisch, Quasi-random sequences and their discrepancies, SIAM J. Sci. Comput. 15 (1994) 1251-1279.
[14] W. J. Morokoff and R. E. Caflisch, Quasi-Monte Carlo integration, J. Comput. Phys. 122 (1995) 218-230.
[15] H. Niederreiter, Low-discrepancy and low-dispersion sequences, J. Number Theory 30 (1988) 51-70.
[16] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, Philadelphia, 1992.
[17] H. Niederreiter, C. P. Xing, Nets, (t,s)-sequences, and algebraic geometry, in: P. Hellekalek and G. Larcher (Eds.), Random and Quasi-Random Point Sets, Lecture Notes in Statistics, Vol. 138, Springer, Berlin, 1998, pp. 267-302.
[18] A. B. Owen, Monte Carlo variance of scrambled net quadrature, SIAM J. Numer. Anal. 34 (1997) 1884-1910.
[19] A. B. Owen, Scrambling Sobol' and Niederreiter points, J. Complexity 14 (1998) 466-489.
[20] A. B. Owen, The dimension distribution and quadrature test functions, Statist. Sinica 13 (2003) 1-17.
[21] Ch. Schlier, Discrepancy behaviour in the non-asymptotic regime, Appl. Numer. Math. 50 (2004) 227-238.
[22] Ch. Schlier, Error trends in Quasi-Monte Carlo integration, Comput. Phys. Comm. 159 (2004) 93-105.
[23] Ch. Schlier, Unpublished supplement to Ref. [22], available as http://phya8.physik.uni-freiburg.de/abt/papers/qrnpap/QRNsuptot.pdf
[24] M. E. da Silva, T. Barbe, Quasi-Monte Carlo in finance: extending for high dimensional problems, obtainable from http://www.econ.fea.usp.br/medsilva
[25] I. M. Sobol', On the distribution of points in a cube and the approximate evaluation of integrals, USSR Comput. Maths. Math. Phys. 7 (1967) 86-112.
[26] B. Tuffin, On the use of low discrepancy sequences in Monte Carlo methods, Monte Carlo Methods Appl. 2 (1996) 295-320.
[27] B. Tuffin, A new permutation choice in Halton sequences, in: H. Niederreiter, P. Hellekalek, G. Larcher and P. Zinterhof (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 1996, Lecture Notes in Statistics, Vol. 127, Springer, Berlin, 1998, pp. 427-435.
[28] B. Vandewoestyne, R. Cools, Good permutations for deterministic scrambled Halton sequences in terms of L2-discrepancy, J. Comput. Appl. Math., accepted.
[29] X. Wang and F. J. Hickernell, Randomized Halton sequences, Mathematical and Computer Modelling 32 (2000) 887-899.
[30] X. Wang, K. M. Kockelman, Tracking land cover change in a mixed logit model: recognizing temporal and spatial effects, http://www.ce.utaxas.edu/prof/kockelman/public.html

Figure captions
Fig. 1: D* of the standard (H, red), scrambled after [3] (B, brown), and reverse [28] (R, green) Halton sequences in four dimensions.

Fig. 2: D*, s = 4. Lower: five internal replicates of the Halton sequence; upper: five instances of the digit-scrambled Halton sequence, shifted by +0.5. Without the shift they differ by less than the fluctuations, which are typically less than 40%. The steps of the computation are 1 for N < 200, then 5 to N ≤ 400, then 50 to N ≤ 950, then 100.

Fig. 3: Average D* of five replicates of scrambled Halton sequences in four dimensions: random shift (F, green), digit scrambled (D, magenta), simplified random start (V, blue), and shuffled (M, black). The straight line is a fit to the latter and has slope -0.533.

Fig. 4: Average D* of five repetitions of three different implementations of the random start Halton sequence in four dimensions (Y: red, Z: blue, V: green).

Fig. 5: T* of Halton sequences in 4 dimensions: H5 (red), B (brown), D (magenta), F (green), V (blue), M (black). Except for B, the average of 5 replicates has been plotted. For small N some T* exceed the random expectation (straight line). For N > 100 the plots of H5, B, D, and V are barely different, while F is somewhat worse, and M stays near the random line.

Fig. 6: T* of five replicates of scrambled Halton sequences in 12 dimensions: internally replicated (H5, lower), digit scrambled (D, upper, shifted upwards by 0.5). Steps are 1 for N ≤ 100, 50 for N ≤ 1000, 500 for N ≤ 10000. The differences between the replicates are smaller than the fluctuations, which are larger for H5 than for D. The straight black lines are the random expectation.

Fig. 7: T* of Halton sequences in 12 dimensions: H (red), D (yellow), F (green), V (cyan), M (black). For the randomized sequences D, F, V, and M the average of 5 repetitions has been plotted. For small N sequences H and M exceed the random expectation (thin straight black line) appreciably. For N > 1000 the plots of H, D, F, and V are barely different; unlike M, they drop fast below the random line.

Fig. 8: Errors of integration of TF1 in 12 dimensions with sequences H (internal replication, red) and V (random start, blue). Dashed bottom group: mean absolute error for 1000 replications; middle group: 10 replications; top group: rms error for a single integration (thin: 10 reps, thick: 1000 reps). For 1000 reps the rms errors of both sequences are within their line thickness. The rms errors of the averages would be 0.5 resp. 1.5 units lower.

Fig. 9: Rms errors of integration of TF4 in 12 dimensions with sequences H (red), F (green), V (blue), and M (black). Thin (thick) lines: 10 (1000) replicates. The straight black line is the expectation for pseudo-random sequences. Note that TF4 has a small effective dimension, so even sequence M is better than pseudo-random.

Fig. 10: Rms errors of integration of TF3 in 4 dimensions with sequences H (red), F (green), V (blue), and M (black). Thin (thick) lines: 10 (1000) replicates. The straight black line is the expectation for pseudo-random sequences. Note that TF3 has full effective dimension.

Fig. 11: Rms errors of integration of TF13 in 4 dimensions with sequences H (red), F (green), V (blue), and M (black). Thin (thick) lines: 10 (1000) replicates. The straight black line is the expectation for pseudo-random sequences. Note that TF13 has a small effective dimension, so even sequence M is better than pseudo-random.
Fig. 12: Distribution of the nearest-neighbour distances of 1000 points of several sequences in 6 dimensions: H (red), R (blue), B (green), pseudo-random (2 instances, black), M (2 instances, violet). Note that M coincides practically with pseudo-random.